Accelerating Phase Based Motion Estimation with Hierarchical Search Technique Using Parallel Threading in Graphical Processing Unit (GPU)

Authors

  • Rosa A. Asmara
  • M. Hariadi
Abstract

This paper presents Phase-Only Correlation (POC) methods for hierarchical-search motion estimation in high-resolution digital video using a Graphical Processing Unit (GPU). Using the POC function, one can estimate the translational displacement between two image blocks from the location of the correlation peak, and their degree of similarity from its height [1]. Motion estimation is the process of determining object movement in digital video sequences, and it is used in fields such as image processing, image analysis, video coding, and computer vision. A POC-based hierarchical search is computationally expensive and leads to long processing times, so the system developed in this paper executes the POC function on the GPU using parallel threads. The evaluation measures the processing time of the methods on the GPU for high-definition video at 1280 x 720 pixel resolution. The results show that the GPU implementation is more than twice as fast as the same methods on the CPU for a 2-layer hierarchical search with a 256x256 POC block size. On the NVIDIA GeForce 9600GT GPU, kernel execution with 256 threads per block, nine 32-bit registers per thread, and 36 bytes of shared memory per thread block reaches 100% multiprocessor occupancy, with 768 active threads, 24 active warps, and 3 active thread blocks per multiprocessor.
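
The abstract's statement that displacement comes from the location of the POC peak and similarity from its height can be illustrated with a small sketch. This is not the authors' GPU implementation: it is a dependency-free Python illustration that uses a naive DFT on a tiny block, and the function names (`dft1d`, `dft2d`, `poc`, `estimate_shift`) are hypothetical. A practical version would use an FFT library and, on the GPU, run the transforms in parallel threads.

```python
import cmath
import random


def dft1d(x, inverse=False):
    """Naive O(n^2) 1-D DFT; the inverse transform is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def dft2d(block, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft1d(row, inverse) for row in block]
    cols = [dft1d(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def poc(f, g, eps=1e-12):
    """Phase-only correlation surface of two equally sized blocks."""
    F, G = dft2d(f), dft2d(g)
    R = []
    for row_f, row_g in zip(F, G):
        out_row = []
        for a, b in zip(row_f, row_g):
            cross = b * a.conjugate()      # cross-power spectrum
            mag = abs(cross)
            # Keep only the phase; bins with no energy contribute nothing.
            out_row.append(cross / mag if mag > eps else 0j)
        R.append(out_row)
    return [[v.real for v in row] for row in dft2d(R, inverse=True)]


def estimate_shift(f, g):
    """Return (dy, dx, peak) such that g is approximately f shifted by (dy, dx)."""
    surface = poc(f, g)
    rows, cols = len(surface), len(surface[0])
    peak, dy, dx = max(
        (surface[y][x], y, x) for y in range(rows) for x in range(cols)
    )
    # Map wrapped indices to signed displacements.
    if dy > rows // 2:
        dy -= rows
    if dx > cols // 2:
        dx -= cols
    return dy, dx, peak


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8  # tiny block for illustration; the paper uses much larger POC blocks
    f = [[rng.random() for _ in range(n)] for _ in range(n)]
    # g is f circularly shifted down by 2 rows and left by 3 columns.
    g = [[f[(y - 2) % n][(x + 3) % n] for x in range(n)] for y in range(n)]
    print(estimate_shift(f, g))
```

For a pure circular shift the peak height is close to 1; lower peaks indicate that the two blocks are less similar. In a hierarchical search, an estimate of this kind would typically be computed at a coarse layer first and refined at a finer one.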

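The occupancy figures quoted in the abstract can be checked with a short calculation. The sketch below is a simplification: the per-multiprocessor limits (768 threads, 8 blocks, 8192 registers, 16 KB shared memory, warp size 32) are assumptions taken from NVIDIA's published limits for compute capability 1.1 devices, not from the paper, and register and shared-memory allocation granularity is ignored. The names `SmLimits` and `occupancy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLimits:
    """Assumed per-multiprocessor limits for a compute capability 1.1 GPU."""
    max_threads: int = 768
    max_blocks: int = 8
    registers: int = 8192
    shared_bytes: int = 16384
    warp_size: int = 32


def occupancy(threads_per_block, regs_per_thread, shared_per_block, limits=SmLimits()):
    """Estimate active blocks, threads, warps, and occupancy per multiprocessor."""
    warps_per_block = -(-threads_per_block // limits.warp_size)  # ceiling division
    max_warps = limits.max_threads // limits.warp_size

    # Each resource caps the number of resident blocks; the tightest cap wins.
    by_threads = max_warps // warps_per_block
    by_regs = limits.registers // max(1, regs_per_thread * threads_per_block)
    by_shared = (
        limits.shared_bytes // shared_per_block if shared_per_block else limits.max_blocks
    )
    blocks = min(limits.max_blocks, by_threads, by_regs, by_shared)

    warps = blocks * warps_per_block
    return {
        "blocks": blocks,
        "threads": blocks * threads_per_block,
        "warps": warps,
        "occupancy": warps / max_warps,
    }


if __name__ == "__main__":
    # Kernel configuration reported in the abstract.
    print(occupancy(threads_per_block=256, regs_per_thread=9, shared_per_block=36))
```

Under these assumed limits, the configuration in the abstract is capped at 3 blocks by both the thread limit (768 / 256) and the register limit (8192 / 2304), which gives 768 active threads, 24 active warps, and 100% occupancy, in agreement with the reported values.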

Similar resources

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...

Parallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform

There are different variants of Particle Swarm Optimization (PSO) algorithm such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating the convergence speed. However, these algorithms are computationally intensive. The go...

Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displac...

Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

Digital Breast Tomosynthesis (DBT) is a technology that creates three dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is in the order of seconds, we can use Tomosynthesis systems to perform Tomosynthesis-guided Interventional procedures. This research has been designed to study u...

Multiprocessing GPU Acceleration of H.264/AVC Motion Estimation under CUDA Architecture

This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of Full Search block matching algorithm in the CUDA architecture, and to compare the performance with a theoretical model and two implementations (sequential and parallel using OpenMP library). We obtained a O(n2/log2n) speed-up wh...

Publication date: 2009